We present a simple game model where agents with different memory lengths compete for finite resources. We show by simulation and analytically that an instability exists at a critical memory length, and as a result, different memory lengths can compete and co-exist in a dynamical equilibrium. Our analytical formulation makes a connection to statistical urn models, and we show that temperature is mirrored by the agent’s memory. Our analysis is easily generalisable to many other game models with implications that we briefly discuss.
Introduction - All successful forms of life must eventually engage in competition for resources. The equilibrium analysis of these competitions began with von Neumann [1] and Nash [2]. The theory of games has since found applications in genetics, ecology, economics and sociology [3]-[6]. Computational implementation of games leads to agent-based models, which may be of particular importance in understanding the behaviour of financial systems [?, ?]. For example, the particularly successful minority game model [8]-[11] captures the competition between intelligent agents with a restricted form of memory. Recent work suggests such games may be generalised, leading to clearly separated regimes of behaviour [12]. In general, understanding the complex collective behaviour arising from the non-linear interactions between individuals is a major challenge for statistical physics [?].


In this Letter we present a simple game model: in each round, individual agents pick one of two urns, each offering a stochastic yield to be shared by the pickers. An agent has a memory of these payouts for the previous $\tau$ rounds to aid its decision. Such stochastic yield sharing arises in animal foraging behaviour and stock trading [?], where the urns may represent different prey species, foraging patches, or stocks. Some form of intelligence is essential in order to compete [16]. As with the minority game [?], agents’ memory in our model is a tool for decision making. However, unlike the minority game, memory in our model is used to make direct estimates of the highest paying choice, rather than to second guess opponents’ next moves. In common with the thermal minority game [?], dynamical urn models [18], and some evolutionary games [?], our agents have a “temperature”, which captures the level of noise in the switches they make in search of yields. We find that the additional noise inherent in their finite memory samples, which is greater for shorter memories, leads the system to behave as if its agents have a higher temperature. Therefore, increasing memory “cools” the system. However, at a critical memory, a Hopf bifurcation [20] emerges, producing stable cycles in the numbers of agents in each urn. Perhaps not surprisingly, a long memory is advantageous, but the presence of these cycles allows short memory agents to compete, and a mixed memory system will evolve toward the bifurcation point. Our theoretical formulation follows that of statistical urn models such as Ehrenfest’s dog flea model, which played an important role in the early development of statistical mechanics, and more recently allowed analytical investigation of effects such as slow relaxation and condensation in nonequilibrium statistical mechanics [?].

Whilst we have restricted this Letter to a yield sharing game, the analysis may be carried through for any social system where agents switch between strategies using their memory to determine the optimal choice. Simple examples include the Hawk Dove [?] and Rock Scissors Paper [22] games. Our formulation could also accommodate the inclusion of topological effects and agent interactions [23]. However, even without such complexities, urn models exhibit a remarkable range of non-equilibrium behavior that is connected with temperature [18]. Our analysis suggests that effects such as condensation and the emergence of order [24], which have social interpretation, may also have a connection with memory [23], but that long memory can also introduce instability.

Model definition - Consider the case of two urns and a total of $n$ agents. We let the urn yields, $U_1(t), U_2(t)$, at round $t$ be random variables uniformly distributed on $[0, \omega n]$ and $[0, n]$ respectively, where $\omega > 1$ so that urn 1 yields more on average than urn 2. We allow agents access to the arithmetic mean of the last $\tau$ payoffs, but we note that other forms of sampling could be used. Letting $\phi_t$ be the fraction of agents in urn 1, then the difference in the average payoffs between urn 1 and urn 2, taken here as urn 2 minus urn 1, is

$$\Delta_t = \frac{1}{\tau}\sum_{s=0}^{\tau-1}\left[\frac{U_2(t-s)}{n(1-\phi_{t-s})} - \frac{U_1(t-s)}{n\phi_{t-s}}\right]. \tag{1}$$


We refer to $\tau$ as the “memory length” of the agents. Agent dynamics is encoded in transition probabilities between urns, which are deterministic functions of $\Delta_t$. At each round, each agent will switch urns using the probabilities

$$W_{1\to 2}(\Delta) = \frac{\epsilon}{2}\left[1 + \tanh(\beta\Delta)\right] \tag{2}$$

$$W_{2\to 1}(\Delta) = \frac{\epsilon}{2}\left[1 - \tanh(\beta\Delta)\right]. \tag{3}$$

The parameter $\beta$, the “inverse temperature”, captures the degree of stochasticity with which agents make choices. For finite $\beta$, agents may decide to switch strategies even though their estimate of the payoff difference is unfavourable. In the limit $\beta \to \infty$ agents will only move if their estimate of the payoff difference indicates that the move is favourable. The parameter $\epsilon$ controls the rate at which strategy switching takes place compared to the rate at which yield information arrives. It may also be seen as the frequency with which opportunities to switch strategy arise. In the limit $\epsilon \to 0$, at most one agent will move at each round.

FIG. 1. Evolution of $\phi_t$ when $n = 10^6$, $\omega = 2$, $\beta = 10$ and $\epsilon = 10^{-6}$. Memory values are $\tau \in \{5, 10, 50, 500\}$ (squares, circles, dots, triangles). Dashed lines are analytical equilibrium values (see equation (12)).

FIG. 2. Evolution of $\phi_t$ when $n = 10^6$, $\omega = 2$, $\beta = 5$ and $\epsilon = 10^{-3}$. Memory values are $\tau \in \{5, 50, 500\}$ (triangles, squares, circles). Dashed line is solution to equation (15) when $\tau = 500$ and $\omega, \beta, \epsilon$ are as above.
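The model defined above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the figures. Since all agents share one memory length, the moves out of each urn are drawn as binomial counts rather than agent by agent. Until the memory window fills, the average is taken over the rounds available. The name `simulate_urn_game`, its defaults and the initial split $\phi_0 = 1/2$ are choices made for this example. The payoff difference is taken as urn 2 minus urn 1, so that a positive value favours moves from urn 1 to urn 2, matching the signs in (2) and (3).

```python
import numpy as np
from collections import deque


def simulate_urn_game(tau, n=10**6, omega=2.0, beta=5.0, eps=1e-3,
                      rounds=20000, seed=0):
    """Return the fraction of agents in urn 1 at the start of each round."""
    rng = np.random.default_rng(seed)
    n1 = n // 2                      # agents in urn 1 (assumed initial split)
    memory = deque()                 # last tau per-agent payoff differences
    running = 0.0                    # running sum of the values in `memory`
    phi = np.empty(rounds)
    for t in range(rounds):
        f = n1 / n
        phi[t] = f
        # Stochastic yields: urn 1 ~ U[0, omega*n], urn 2 ~ U[0, n].
        u1 = rng.uniform(0.0, omega * n)
        u2 = rng.uniform(0.0, n)
        # Per-agent payoff difference for this round (urn 2 minus urn 1).
        d = u2 / (n * (1.0 - f)) - u1 / (n * f)
        memory.append(d)
        running += d
        if len(memory) > tau:
            running -= memory.popleft()
        delta = running / len(memory)
        # Switching probabilities, equations (2) and (3).
        w12 = 0.5 * eps * (1.0 + np.tanh(beta * delta))
        w21 = 0.5 * eps * (1.0 - np.tanh(beta * delta))
        out_of_1 = rng.binomial(n1, w12)
        out_of_2 = rng.binomial(n - n1, w21)
        # Keep both urns occupied so the per-agent payoffs stay finite.
        n1 = int(min(max(n1 - out_of_1 + out_of_2, 1), n - 1))
    return phi
```

Calling `simulate_urn_game(tau=50)` and then `simulate_urn_game(tau=500)` with the defaults above corresponds to two of the memory values of Figure 2. The sketch is meant to show the update rule, and makes no claim to reproduce the figures quantitatively.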

Simulation (instability) - We simulate the model for a series of values of $\tau$ when $n = 10^6$. Two different values of $\epsilon$ are used; in Figure 1 we have $\epsilon^{-1} = 10^6 \gg \tau$ and in Figure 2 we have $\epsilon = 10^{-3}$. For $\epsilon = 10^{-6}$, the expected number of moves at each step is $< 1$, and $\phi$ appears very stable. For larger $\epsilon$, $\phi$ experiences much larger fluctuations about the steady state value, driven by the yield process. For shorter memory values these fluctuations are random, but as $\tau$ approaches $\epsilon^{-1}$, periodic oscillations appear and dominate. The appearance of these stable oscillations at critical memory, $\tau_c$, is known as a Hopf bifurcation [20]. By allowing agents access only to the mean of their memory, we implicitly assume that changes in the expected payoff over the course of their memory, brought about by oscillations, are too subtle for them to infer from noise.

Simulation (coexistence) - We now investigate how agents with two different memories compete against one another by interpreting the payoff as reproduction rate. We define $\delta$ and $\gamma$ as the rates of death, and reproduction per unit payoff, respectively. Reproduction is assumed to occur before death in each round, but in practice the probability of any one agent reproducing and dying in the same round is extremely small for the $\gamma, \delta$ values we choose. Letting $\rho_i^\tau(t)$ be the number of agents with memory $\tau$ in urn $i$ at time $t$ we set the probability of birth for an agent in urn $i$ to be

$$P(\text{birth}) = \frac{\gamma U_i(t)}{\sum_\tau \rho_i^\tau(t)}. \tag{4}$$

The death probability for each agent is set equal to $\delta$.
If populations are fixed in size and the system is not in an oscillatory state, then we expect that in equilibrium the longer memory agents will dominate the high yielding urn. Their long memory allows them to perceive smaller statistical advantages that are obscured by noise for the short memory agents. Using the thermodynamic analogy, the higher temperature (shorter memory) agents are more likely to make moves which leave them in an urn with a lower expected payoff, corresponding to a higher “energy” state. Above zero temperature, and in the absence of oscillations, the high yield urn will be under-exploited, placing high memory agents at an advantage. This effect can be observed in Figure 3 where we have simulated a mixed population of two memories $\tau \in \{10, 1000\}$ beginning with a ratio of 10:1 short memory to long memory agents. We see that initially the advantage afforded the long memory agents causes their population to grow, whereas the short memory agents reduce in number. Were this advantage to be sustained indefinitely then we would expect the short memory agents to eventually disappear, but in fact the populations stabilize. This effect appears because the long memory agents cause oscillations to develop once they are in sufficiently high concentration. In the presence of oscillations the short memory agents have an advantage because they can quickly observe opportunities offered by the oscillating payoffs. We therefore expect the system to evolve to the point where oscillations are just beginning to form. We may observe this evolution by making use of the variance of $\phi_t$ as an order parameter which captures proximity to the Hopf bifurcation point. In Figure 3 we see that at a critical ratio of short to long memory agents, the variance climbs rapidly, stabilizing just below the value seen in a system where all agents have memory $\tau_c$ but all other parameters are equal. In this way the Hopf bifurcation may be viewed as a self organized state.

FIG. 3. Scaled populations $\rho^\tau(t) := \rho_1^\tau(t) + \rho_2^\tau(t)$ for $\tau = 10$ (circles) and $\tau = 1000$ (triangles) when $n = \rho(0) = 10^6$, $\epsilon = 10^{-3}$, $\beta = 4$ with $\gamma = 10^{-4}$ and $\delta = 2 \times 10^{-4}$. Also shown (thin black line) is evolution of variance of $\phi_t$ during population dynamics simulation. Straight dashed line shows variance of homogeneous population with the same $\epsilon, \beta, \omega$ values at critical memory $\tau_c$. Note: rapid initial equilibration of population values (bringing birth and death into balance) is not visible on time scale of plot.
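The birth and death rule (4) can be sketched as a single update step. This is an illustrative sketch under stated assumptions, not the implementation behind Figure 3. Populations are stored as an integer array `rho[m, i]` (memory class `m`, urn `i`), and births and deaths are drawn as binomial counts. Death is applied only to agents alive at the start of the round, since the text does not say whether newborns can die in the round of their birth. The name `birth_death_step` is hypothetical.

```python
import numpy as np


def birth_death_step(rho, yields, gamma, delta, rng):
    """Apply one round of reproduction, then death, to the populations.

    rho    : integer array of shape (n_memories, 2), agents per memory and urn
    yields : length-2 array holding the urn yields U_1(t), U_2(t)
    """
    rho = np.asarray(rho, dtype=np.int64)
    yields = np.asarray(yields, dtype=float)
    occupancy = rho.sum(axis=0)                  # agents per urn, all memories
    # Equation (4): per-agent birth probability in each urn, kept in [0, 1].
    p_birth = np.clip(gamma * yields / np.maximum(occupancy, 1), 0.0, 1.0)
    births = rng.binomial(rho, p_birth)          # broadcasts over memory classes
    deaths = rng.binomial(rho, delta)            # assumption: newborns are spared
    return rho + births - deaths
```

In a full simulation this step would be combined with the switching rule (2), (3), applied separately to each memory class using its own window average.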

Analysis (equilibrium) - We consider the behaviour of the model as $\epsilon \to 0$, allowing us to view it as an urn model in the Ehrenfest class [18] where agents independently make transitions using state ($\phi_t$) dependent probabilities. Provided $\tau \ll \epsilon^{-1}$, the fraction $\phi_t$ may be approximated by a constant $\phi$ during the window over which payoff averaging takes place. In this case, by the central limit theorem, the marginal distributions of $\Delta_t$ for each $t$ are approximately normal $N(\bar\Delta, \sigma^2/\tau)$ where, from (1),

$$\bar\Delta(\phi,\omega) = \frac{1}{2}\left[\frac{1}{1-\phi} - \frac{\omega}{\phi}\right] \tag{5}$$

$$\sigma^2(\phi,\omega) = \frac{1}{12}\left[\frac{1}{(1-\phi)^2} + \frac{\omega^2}{\phi^2}\right]. \tag{6}$$


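We now introduce an intermediate time scale $T$ satisfying $\tau \ll T \ll \epsilon^{-1}$ and define the time average $\langle\cdot\rangle$, over a window of length $T$

$$\langle W_{i\to j}(\Delta)\rangle(t) := \frac{1}{T}\sum_{s=t-T+1}^{t} W_{i\to j}(\Delta_s). \tag{7}$$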


This average is a random variable which, for constant $\phi$, has expected value $E[W_{i\to j}(\Delta)]$ where the expectation is taken over the marginal distribution of $\Delta$. The condition $\tau \ll T \ll \epsilon^{-1}$ ensures that $\phi$ is approximately constant over the window and that the variance of $\langle W_{i\to j}(\Delta)\rangle$ is proportional to $T^{-1}$ (because $\Delta_{t_1}$ and $\Delta_{t_2}$ are dependent only when $|t_2 - t_1| < \tau \ll T$). As $\epsilon \to 0$, then assuming $T$ is sufficiently large, the probability that an agent will make a transition $i \to j$ during interval $T$ approaches $T\langle W_{i\to j}(\Delta)\rangle \approx T E[W_{i\to j}(\Delta)]$, equivalent to a memoryless (Ehrenfest class) model where transition probabilities (2) and (3) are replaced with their expectations $E[W_{i\to j}(\Delta)]$. Averaging over the normally distributed difference $\Delta$ we find that

$$\langle W_{1\to 2}(\Delta)\rangle \approx E[W_{1\to 2}(\Delta)] \approx \frac{\epsilon}{2}\left[1 + \tanh(\alpha\bar\Delta)\right] \tag{8}$$

where

$$\alpha = \sqrt{\frac{2\tau\beta^2}{2\tau + \pi\beta^2\sigma^2}}. \tag{9}$$


To obtain this result, we have made the approximation $\tanh(\beta\Delta) \approx \mathrm{erf}(\sqrt{\pi}\beta\Delta/2)$, allowing us to make use of the exact relationship $E[\mathrm{erf}(\sqrt{\pi}\beta\Delta/2)] = \mathrm{erf}(\sqrt{\pi}\alpha\bar\Delta/2)$. The constant $\alpha$ acts as an effective inverse temperature and we see that increasing $\tau$ “cools” the system closer to the inverse temperature $\beta$, and in the limit $\beta \to \infty$, $\alpha \propto \sqrt{\tau}$. To complete our analogy to a thermal urn model we now write the probability of finding the agents in a particular arrangement, or microstate, $i$, such that a fraction $\phi$ are in urn 1, as $P_i(\phi) \propto e^{-\alpha E}$ where $E$ is an “energy” function. Considering two microstates separated by a single transition, and defining $\delta\phi = 1/n$, then detailed balance requires that in equilibrium $2\alpha\bar\Delta = \partial_\phi(\alpha E)\delta\phi$. This condition allows $E(\phi)$ to be computed, in principle, by integration. A closed form approximation $E(\phi) \approx -n\ln[\phi^\omega(1-\phi)]$ is obtained by noting that $\alpha$ depends weakly on $\phi$ compared to $E$ so that $\partial_\phi(\alpha E) \approx \alpha\partial_\phi E$. Summing over all microstates corresponding to macrostate $\phi$ we have a Boltzmann probability distribution for $\phi$

$$P(\phi) = \frac{n!}{(n\phi)!\,(n(1-\phi))!}\,\frac{e^{-\alpha(\phi)E(\phi)}}{Z} \tag{10}$$


where $Z$ is the partition function. Taking the thermodynamic limit $n \to \infty$, and making use of Stirling’s approximation, we find that the most likely (maximum entropy) fraction, $\phi$, satisfies:

$$\frac{1}{n}\partial_\phi \ln P(\phi) = 0 \quad\Longleftrightarrow\quad 1 - 2\phi - \tanh(\alpha\bar\Delta) = 0. \tag{11}$$

As the memory increases and the system cools we expect the agents to arrange themselves so that yields are shared more fairly. We therefore linearize (11) about the perfectly fair state, $\phi = \omega/(1+\omega)$, where agents in both urns receive the same expected payoff, finding that

$$\phi \approx \frac{f(\tau) + \dfrac{\beta(1+\omega)^2}{2}}{2f(\tau) + \dfrac{\beta(1+\omega)^3}{2\omega}} \tag{12}$$
where $f(\tau) = \sqrt{1 + \pi\beta^2(1+\omega)^2/(12\tau)}$. The accuracy of this approximation is verified in Figure 1. For larger values of $\epsilon$ (Figure 2) agents move more quickly so the averaging effect (7) damps fluctuations in transition rates less strongly, creating larger fluctuations in $\phi_t$. For finite $\beta$ the system cannot reach perfect fairness for any memory length, but in the limit $\beta \to \infty$ where the transition probabilities (2) and (3) become step functions, we have that:

$$\phi \approx \frac{\omega}{\omega+1}\left[1 - \frac{(\omega-1)\sqrt{\pi}}{\sqrt{3\tau}\,(\omega+1)^2}\right]. \tag{13}$$


From this we see that the distance away from the fair state decreases as $\tau^{-1/2}$ as the memory of the agents becomes large. However, we now show why increasing $\tau$ too far, when $\epsilon$ is finite, destabilizes the system.
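The equilibrium results can be checked numerically. The sketch below is an illustration, not part of the original analysis. It assumes the sign convention in which $\bar\Delta$ is the mean per-agent payoff of urn 2 minus that of urn 1, so that the stationarity condition (11) reads $1 - 2\phi - \tanh(\alpha\bar\Delta) = 0$ with $\alpha$ given by (9). It solves this condition by bisection and compares the root with the linearized estimate (12). All function names are hypothetical.

```python
import math


def delta_bar(phi, omega):
    """Mean payoff difference (urn 2 minus urn 1), equation (5)."""
    return 0.5 * (1.0 / (1.0 - phi) - omega / phi)


def sigma2(phi, omega):
    """Single-round variance of the payoff difference, equation (6)."""
    return (1.0 / (1.0 - phi) ** 2 + omega ** 2 / phi ** 2) / 12.0


def alpha(phi, omega, beta, tau):
    """Effective inverse temperature, equation (9)."""
    s2 = sigma2(phi, omega)
    return math.sqrt(2.0 * tau * beta ** 2 / (2.0 * tau + math.pi * beta ** 2 * s2))


def equilibrium_phi(omega, beta, tau, tol=1e-12):
    """Bisection root of 1 - 2*phi - tanh(alpha * delta_bar) on (0, 1)."""
    def h(phi):
        a = alpha(phi, omega, beta, tau)
        return 1.0 - 2.0 * phi - math.tanh(a * delta_bar(phi, omega))
    lo, hi = 1e-9, 1.0 - 1e-9        # h(lo) > 0 and h(hi) < 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def linearised_phi(omega, beta, tau):
    """Linearized equilibrium fraction, equation (12)."""
    f = math.sqrt(1.0 + math.pi * beta ** 2 * (1.0 + omega) ** 2 / (12.0 * tau))
    numerator = f + 0.5 * beta * (1.0 + omega) ** 2
    denominator = 2.0 * f + beta * (1.0 + omega) ** 3 / (2.0 * omega)
    return numerator / denominator
```

For instance, `equilibrium_phi(2.0, 10.0, tau)` and `linearised_phi(2.0, 10.0, tau)` can be compared for the memory values of Figure 1. Both should lie below the fair value $\omega/(1+\omega) = 2/3$ and approach it as $\tau$ grows.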

Analysis (instability) - As $\tau$ increases, fluctuations in $\Delta_t$ due to the yield process are reduced but for finite $\epsilon$ we can no longer treat $\phi_t$ as a constant over the averaging window. It is instructive, therefore, to study the effect of variations in $\phi_t$, neglecting the variations in yield. Promoting $t$ to a continuous variable and replacing the urn yields with their mean values we have

$$\Delta_t = \frac{1}{2\tau}\int_{t-\tau}^{t}\left[\frac{1}{1-\phi_s} - \frac{\omega}{\phi_s}\right] ds. \tag{14}$$


We then approximate the evolution of $\phi_t$ using the following delay differential equation:

$$\dot\phi_t = (1-\phi_t)W_{2\to 1}(\Delta_t) - \phi_t W_{1\to 2}(\Delta_t). \tag{15}$$
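Equations (14) and (15) can be integrated with a simple forward-Euler scheme. The sketch below is illustrative only and rests on several assumptions: a unit time step (one round), a constant history $\phi_s = \phi_0$ for $s \le 0$, an integer memory $\tau$, and the transition probabilities $W_{1\to 2} = \frac{\epsilon}{2}[1 + \tanh(\beta\Delta)]$, $W_{2\to 1} = \frac{\epsilon}{2}[1 - \tanh(\beta\Delta)]$, for which (15) reduces to $\dot\phi_t = \frac{\epsilon}{2}[1 - 2\phi_t - \tanh(\beta\Delta_t)]$. The name `integrate_dde` and the default parameters are chosen for this example.

```python
import numpy as np


def integrate_dde(tau, omega=2.0, beta=5.0, eps=1e-3, phi0=0.5, steps=40000):
    """Forward-Euler solution of (15) with the window average (14), dt = 1."""
    def integrand(phi):
        # Mean per-agent payoff difference (urn 2 minus urn 1) at fraction phi.
        return 0.5 * (1.0 / (1.0 - phi) - omega / phi)

    phi = np.empty(steps + 1)
    phi[0] = phi0
    # Circular buffer with the last tau integrand values (constant history).
    window = np.full(tau, integrand(phi0))
    running = window.sum()
    for t in range(steps):
        delta = running / tau                      # discretized equation (14)
        phi[t + 1] = phi[t] + 0.5 * eps * (1.0 - 2.0 * phi[t]
                                           - np.tanh(beta * delta))
        slot = t % tau                             # oldest entry in the buffer
        newest = integrand(phi[t + 1])
        running += newest - window[slot]
        window[slot] = newest
    return phi
```

With the default parameters, which match those quoted for Figure 2, `integrate_dde(50)` would be expected to settle to a fixed point while `integrate_dde(500)` would be expected to sustain oscillations, in line with the critical memory discussed later in this section.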


A numerical solution to equation (15) is shown in Figure 2 along with simulation results using the same parameter values. The oscillations in the simulation are accurately captured by (15) but the stochastic yield disrupts their perfect periodicity. To discover the parameter values at which stable oscillations develop we linearize equation (15) by writing $\phi_t = \phi + \psi_t$ where $\psi_t$ are small fluctuations and $\phi$ is the constant fixed point, not necessarily stable, of equation (15). In terms of these new variables

$$\Delta_t \approx \bar\Delta(\phi,\omega) + \frac{6\sigma^2(\phi,\sqrt{\omega})}{\tau}\int_{t-\tau}^{t}\psi_s\,ds \tag{16}$$


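where the functions $\bar\Delta$ and $\sigma^2$ are defined in (5) and (6). After expanding the tanh functions in the transition rates to first order about $\bar\Delta(\phi,\omega)$, we obtain the following linear delay equation

$$\dot\psi_t = -\epsilon\left[\psi_t + \frac{\Lambda}{\tau}\int_{t-\tau}^{t}\psi_s\,ds\right] \tag{17}$$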


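where $\Lambda = 3\beta\,\mathrm{sech}^2[\beta\bar\Delta(\phi,\omega)]\,\sigma^2(\phi,\sqrt{\omega})$. To determine the stability of this equation we introduce an exponential trial solution $\psi_t = e^{\lambda t}$ where $\lambda = x + iy$. Substitution into equation (17) yields a characteristic equation with real and imaginary parts given by

$$x^2 - y^2 + \epsilon x + \frac{\epsilon\Lambda}{\tau}\left(1 - e^{-\tau x}\cos\tau y\right) = 0 \tag{18}$$

$$2xy + \epsilon y + \frac{\epsilon\Lambda}{\tau}e^{-\tau x}\sin\tau y = 0. \tag{19}$$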
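The step from (17) to (18) and (19) can be written out as follows. This is a sketch of the intermediate algebra, assuming (17) has the form $\dot\psi_t = -\epsilon[\psi_t + (\Lambda/\tau)\int_{t-\tau}^{t}\psi_s\,ds]$ and writing $\lambda = x + iy$ for the exponent of the trial solution.

```latex
% Substitute \psi_t = e^{\lambda t} into (17) and split into real and imaginary parts.
\begin{align*}
\int_{t-\tau}^{t} e^{\lambda s}\,ds
  &= \frac{e^{\lambda t}}{\lambda}\left(1 - e^{-\lambda\tau}\right), \\
\lambda e^{\lambda t}
  &= -\epsilon\, e^{\lambda t}\left[1 + \frac{\Lambda}{\lambda\tau}\left(1 - e^{-\lambda\tau}\right)\right]
  \;\Longrightarrow\;
  \lambda^2 + \epsilon\lambda + \frac{\epsilon\Lambda}{\tau}\left(1 - e^{-\lambda\tau}\right) = 0, \\
\lambda^2 &= x^2 - y^2 + 2ixy, \qquad
e^{-\lambda\tau} = e^{-\tau x}\left(\cos\tau y - i\sin\tau y\right).
\end{align*}
```

Collecting the real terms gives (18) and collecting the imaginary terms gives (19).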

For sufficiently small memory, $\tau$, the real part, $x$, of the solutions to (18) and (19) is negative so the fixed point $\phi$ is stable. As we increase $\tau$, $\lambda$ crosses through the imaginary axis, creating a switch to instability with oscillations of exponentially increasing magnitude. Although the full equation (15) shares this transition to instability, we find that the resulting oscillations are bounded. The appearance of these stable oscillations as $\tau$ passes through a critical value, which we denote $\tau_c$, constitutes the Hopf bifurcation [20]. To compute $\tau_c$ we set $x = 0$ in equation (19) so that $\mathrm{sinc}(\tau y) = -\Lambda^{-1}$, with $\mathrm{sinc}(u) = \sin(u)/u$. Expanding the sinc function to second order about its root at $\pi/\tau$ and solving the resulting quadratic we find that

$$y \approx \frac{\pi}{2\tau}\left(3 - \sqrt{\frac{\Lambda - 4}{\Lambda}}\right) := \frac{\kappa}{\tau} \tag{20}$$

which defines a new constant $\kappa$. Substitution of this solution into (18) yields the following expression for the critical memory length

$$\tau_c \approx \frac{\kappa^2}{\epsilon\Lambda(1 - \cos\kappa)}. \tag{21}$$

For example, for the parameter values used in Figure 2 we have $\tau_c \approx 400$, whereas the relevant critical value for Figure 1 is $\tau_c = 1.8 \times 10^5$. These values are in excellent agreement with simulations.
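The critical memory can be evaluated directly from the expressions above. The following sketch is an illustration under stated assumptions. The fixed point of (15) is taken as the root of $1 - 2\phi - \tanh(\beta\bar\Delta) = 0$, with $\bar\Delta$ the mean payoff of urn 2 minus that of urn 1, and is found by bisection. $\Lambda$, $\kappa$ and $\tau_c$ are then computed as $\Lambda = 3\beta\,\mathrm{sech}^2[\beta\bar\Delta]\,\sigma^2(\phi,\sqrt{\omega})$, $\kappa = \frac{\pi}{2}(3 - \sqrt{1 - 4/\Lambda})$ and $\tau_c = \kappa^2/[\epsilon\Lambda(1 - \cos\kappa)]$. The function names are hypothetical.

```python
import math


def fixed_point(omega, beta, tol=1e-13):
    """Fixed point of the delay equation: root of 1 - 2*phi - tanh(beta*delta_bar)."""
    def h(phi):
        d = 0.5 * (1.0 / (1.0 - phi) - omega / phi)
        return 1.0 - 2.0 * phi - math.tanh(beta * d)
    lo, hi = 1e-9, 1.0 - 1e-9        # h decreases monotonically from + to -
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def critical_memory(omega, beta, eps):
    """Critical memory length tau_c from the quadratic approximation for kappa."""
    phi = fixed_point(omega, beta)
    d = 0.5 * (1.0 / (1.0 - phi) - omega / phi)
    # sigma^2 evaluated at sqrt(omega): omega replaces omega^2 in the variance.
    s2 = (1.0 / (1.0 - phi) ** 2 + omega / phi ** 2) / 12.0
    lam = 3.0 * beta * s2 / math.cosh(beta * d) ** 2
    if lam <= 4.0:
        raise ValueError("the quadratic approximation for kappa needs Lambda > 4")
    kappa = 0.5 * math.pi * (3.0 - math.sqrt(1.0 - 4.0 / lam))
    return kappa ** 2 / (eps * lam * (1.0 - math.cos(kappa)))
```

With the parameters quoted for the two figures, `critical_memory(2.0, 5.0, 1e-3)` and `critical_memory(2.0, 10.0, 1e-6)` can be compared with the values $\tau_c \approx 400$ and $\tau_c = 1.8 \times 10^5$ stated in the text.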

Conclusion - We have introduced a simple thermal urn model of competition between agents with memory. Increasing memory allows agents to more accurately determine the most productive strategy, and reduces the temperature of the model. However, if a sufficiently high concentration of long memory agents is present a limit cycle appears which reduces the competitiveness of long memory agents, leading to self organized Hopf bifurcation in a mixed memory model. The simplicity of our model, its connection to classical urn models, together with the fact that limit cycles arise naturally suggest it might be fruitfully generalized, and employed to study a variety of different games. For example our approach may be applied to the Rock Scissors Paper game [?], where agents, interacting pairwise, recall their last $\tau$ interactions. Although a larger memory provides better statistical data on the optimal strategy, at critical memory a limit cycle emerges about Nash equilibrium, destroying this competitive advantage [27]. Other natural extensions include the introduction of multiple urns to represent, for example, different financial stocks. In this case we would expect more complex patterns of oscillation [?]. Experimental research into the nature of human and animal memory [28]-[31] places emphasis on the “forgetting function” which describes how memories decay with time. Such a function, or greater powers of statistical inference, could be naturally incorporated into our analysis, and its effects on stability explored.